Imitation in Reinforcement Learning

Authors

  • Dana Dahlstrom
  • Eric Wiewiora
Abstract

The promise of imitation is to facilitate learning by allowing the learner to observe a teacher in action. Ideally this leads to faster learning when the teacher knows an optimal policy. Imitating a suboptimal teacher may slow learning, but it should not prevent the student from eventually surpassing the teacher’s performance.

Several researchers have studied imitation in the context of reinforcement learning. Perhaps the most straightforward formulation is to apply a standard reinforcement learning algorithm, such as Q-learning, to the teacher’s experience rather than the learner’s [9, 2, 3, 8]. Price and Boutilier extend this approach to the case where the learner does not know which actions the teacher takes [6, 7]. They do assume, however, that the learner knows a priori its reward R(s) for a transition into any state s.

Another way to incorporate expert information is shaping: the introduction of small rewards in certain states where progress is made toward an environment reward. Shaping can speed up learning, but it risks corrupting the underlying reward structure and thus changing which strategies are optimal. Ng et al. have shown that potential-based shaping functions preserve the partial ordering of policies with respect to optimality [4]. This means a poor potential function will at worst slow learning but will not prevent convergence in the long run, while a potential function that closely approximates states’ true values will speed up learning. An ideal potential function for imitation would be based on salient properties of the teacher’s policy: if the teacher consistently completes some set of subgoals on the way to a reward, it would be beneficial for the learner to receive shaping rewards when it completes them.
Ng is developing a way to extract a learner’s reward function from observations of its policy, but his method does not guarantee a reward function that makes a good shaping function: it favors large, sporadic rewards over small, frequent ones [5]. In fact, Ng poses reverse-engineering a shaping function as an open problem.
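The potential-based shaping idea above can be sketched in code. This is a minimal illustration, not the paper’s implementation: it runs tabular Q-learning on a small deterministic chain MDP and adds the shaping term F(s, s') = γΦ(s') − Φ(s) from Ng et al. [4] to each environment reward. The chain environment, the potential function `phi`, and all hyperparameters are illustrative assumptions.

```python
import random

GAMMA = 0.9            # discount factor (assumed)
ALPHA = 0.5            # learning rate (assumed)
N_STATES = 5           # states 0..4; reaching state 4 yields the environment reward
ACTIONS = [-1, +1]     # move left or right along the chain

def phi(s):
    """Illustrative potential: states closer to the goal get higher potential."""
    return s / (N_STATES - 1)

def step(s, a):
    """Deterministic chain dynamics; reward 1 only on reaching the goal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

def q_learning(shaped, episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # per-episode step limit
            # epsilon-greedy action selection
            if rng.random() < 0.1:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            if shaped:
                # potential-based shaping term: preserves the policy ordering [4]
                r += GAMMA * phi(s2) - phi(s)
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

Q = q_learning(shaped=True)
# The greedy policy should move right, toward the goal, from every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Because the shaping term telescopes along any trajectory, it changes only the speed of learning, not which policy is optimal; an imitation-derived Φ would assign high potential to the teacher’s subgoal states instead of the hand-coded ramp used here.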


Similar articles

Embodied imitation-enhanced reinforcement learning in multi-agent systems

Imitation is an example of social learning in which an individual observes and copies another’s actions. This paper presents a new method for using imitation as a way of enhancing the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared to other research that uses imitation with reinforcement learning, our method uses imitati...


Learning by Imitation, Reinforcement and Verbal Rules in Problem Solving Tasks

Learning by imitation is a powerful process for acquiring new knowledge, but there has been little research exploring imitation’s potential in the problem solving domain. Classical problem solving techniques tend to center around reinforcement learning, which requires significant trial-and-error learning to reach successful goals and problem solutions. Heuristics, hints, and reasoning by analog...


Reinforcement and Imitation Learning via Interactive No-Regret Learning

Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner’s predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti...



A Bayesian Approach to Imitation in Reinforcement Learning

In multiagent environments, forms of social learning such as teaching and imitation have been shown to aid the transfer of knowledge from experts to learners in reinforcement learning (RL). We recast the problem of imitation in a Bayesian framework. Our Bayesian imitation model allows a learner to smoothly pool prior knowledge, data obtained through interaction with the environment, and informa...


A unified framework for imitation-like behaviors

In this paper, we combine the formal methods from reinforcement learning with the paradigm of imitation learning. The extension of the reinforcement learning framework to integrate the information provided by an expert (demonstrator) has the important advantage of allowing a clear decrease of the time necessary to learn certain robotic tasks. Hence, learning by imitation can be interpreted as a...




Publication date: 2002